ON TESTING MONOTONE TENDENCIES


Richard  L.  Dykstra  and  Tim  Robertson 


SUMMARY 


In certain problems, it may be expected that a regression function has
a substantial overall tendency to be monotone and yet we may not be certain
that all of the restrictions imposed by a simple order are satisfied.
Distribution theory for likelihood ratio tests of homogeneity of a collection
of normal means when the collection is "decreasing on the average" and for
testing "decreasing on the average" as a null hypothesis, is presented. The
restriction "decreasing on the average" is less restrictive than the usual
monotone restriction and allows the data to give rise to "reversals" over
short ranges of values of the parameter set. It is closely related to the
"starshaped ordering" restriction discussed in Shaked (Ann. Statist.
(1979)).
1. INTRODUCTION AND SUMMARY. The detection of a monotone relationship between
two variables is an important problem in statistics. One approach to such
problems is to base a conclusion about such a relationship upon an estimate
of, or a test of a hypothesis about, a regression function. Procedures for
making inferences about parameters which are known or suspected to satisfy
a trend have received considerable attention in the statistical literature
and a comprehensive treatment of much of the early work done on order
restricted inference is given in Barlow, Bartholomew, Bremner and Brunk
(1972).

In certain problems, it may be expected that the regression function
has a substantial overall tendency to be monotone and yet we may not be
certain that all of the restrictions imposed by a simple order are satisfied
(this point is made on page 165 of Barlow et al. (1972)). For example, it
is generally believed that mortality rates increase with age. The crude
mortality rates per 10,000 insured male lives for ages 15 through 35 are
shown in Figure 1. These data are taken from the 1973 Reports of Mortality
and Morbidity Experience of the Transactions of the Society of Actuaries. The
rates are the 1965-70 ultimate experience and are based upon approximately
35,000 insured lives in each age group so that the standard error should be
roughly 1.7 per 10,000 lives. The hypothesis that the mortality rates tend
to increase with age seems to be confirmed by the data in Figure 1. However,
it is not at all clear that the underlying regression function is strictly
increasing over this range of ages. In fact, actuaries now believe that
there is a "bump" in the mortality rate at about age 20 and that the
mortality rate actually decreases from, roughly, ages 20 through 25. The
1965-70 graduations were among the first to reflect this "bump."


Figure 1. Crude Mortality Rates Per 10,000 Insured Male Lives.
The "usual" order restricted inference procedures do not allow the data
to  give  rise  to  such  "bumps"  and  the  least  squares  order  restricted  estimate 
of  the  mortality  rate  based  upon  this  data  is  actually  constant  from  ages 
19  through  28.  These  considerations  suggest  that  inference  procedures  which 
account  for  a  somewhat  less  restrictive  monotone  relationship  could  be  of 
value.  This  paper  is  an  account  of  our  studies  of  one  method  of  modeling  a 
monotone  relationship  which  allows  the  regression  function  to  go  counter  to 
the  overall  trend  over  short  ranges  of  values. 

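Def: Suppose $\theta = (\theta_1, \theta_2, \ldots, \theta_k)$ is a vector of parameters and
$w = (w_1, w_2, \ldots, w_k)$ is a vector of positive weights. We say that $\theta$ is
"decreasing on the average from the left" (DAL) with respect to $w$ provided

\[ \bar\theta_1 \ge \bar\theta_2 \ge \cdots \ge \bar\theta_k \quad\text{where}\quad \bar\theta_i = \Big(\sum_{j=1}^{i} w_j\Big)^{-1} \sum_{j=1}^{i} w_j\theta_j;\ i = 1,2,\ldots,k. \tag{1.1} \]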

Increasing  on  the  average  from  the  left  (IAL),  increasing  on  average  from  the 
right  (IAR)  and  decreasing  on  the  average  from  the  right  (DAR)  are  defined 
analogously. 
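
As an illustration of the definition, the following minimal Python sketch checks whether a vector is DAL with respect to a set of weights by comparing successive weighted running averages. The function names and the tolerance argument are hypothetical and are introduced only for this example.

```python
from itertools import accumulate


def weighted_running_means(theta, w):
    """Running averages theta_bar_i = (sum_{j<=i} w_j theta_j) / (sum_{j<=i} w_j)."""
    num = list(accumulate(t * wj for t, wj in zip(theta, w)))
    den = list(accumulate(w))
    return [n / d for n, d in zip(num, den)]


def is_dal(theta, w, tol=1e-12):
    """True when the running averages are nonincreasing, as in (1.1).

    tol is a small slack for floating point round-off (an assumption of this sketch).
    """
    means = weighted_running_means(theta, w)
    return all(a >= b - tol for a, b in zip(means, means[1:]))
```

For instance, with unit weights the vector (.3, 0, .1, 0, .1) is DAL although it is not decreasing, while (1, 2, 3) is not DAL.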

Clearly if $\theta_1 \ge \theta_2 \ge \cdots \ge \theta_k$ then $\theta$ is DAL (also IAR). The order
restriction, DAL, is closely related to the "starshaped ordering" discussed in
Shaked (1979). A vector $\theta$ is said to be lower-starshaped if, in addition
to (1.1), $\bar\theta_k \ge 0$. Shaked (1979) studies estimates of normal and Poisson
means subject to the restriction that they are starshaped and gives examples
from reliability theory and from branching processes in which such orderings
are of interest.

In Section 2, assuming that $\theta$ is a vector of normal means, we derive
the restricted maximum likelihood estimate of $\theta$. Our restrictions are
somewhat more general than those considered by Shaked (1979), in that we
allow "$\ge$", "$\le$" and "=" to be intermixed in the restriction (1.1). Our
method of derivation is different from that of Shaked and leads to the
distributions of likelihood ratio statistics for tests where either the null
or alternative hypothesis requires that the parameter vector is monotone on
the average. These hypothesis tests are discussed in Sections 3 and 4.
The test statistics have null-hypothesis distributions which are mixtures
of chi-square or beta distributions, the so-called chi-bar-square ($\bar\chi^2$)
and E-bar-square ($\bar E^2$) distributions.


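2. RESTRICTED MAXIMUM LIKELIHOOD ESTIMATES. Suppose $Y_1, Y_2, \ldots, Y_k$ are
independent random variables and that $Y_i \sim N(\theta_i, a_i\sigma^2)$ where $a_1, a_2, \ldots, a_k$
are known positive constants. Let $w_i = (a_i\sigma^2)^{-1}$ and $W_i = \sum_{j=1}^{i} w_j$; $i = 1,2,\ldots,k$.
We wish to find the maximum likelihood estimator (MLE) of
$\theta = (\theta_1, \theta_2, \ldots, \theta_k)$ subject to the constraints

\[ \bar\theta_{i_\ell - 1} = \bar\theta_{i_\ell} \quad (\text{or equivalently } \bar\theta_{i_\ell - 1} = \theta_{i_\ell});\ \ell = 1,2,\ldots,m, \tag{2.1} \]

where $\bar\theta_i = W_i^{-1}\sum_{j=1}^{i} w_j\theta_j$ and $i_1 < i_2 < \cdots < i_m$ is an ordered subset of
$\{1,2,\ldots,k\}$. Using the constraints (2.1) to write $\theta_{i_j}$; $j = 1,2,\ldots,m$ in
terms of $\theta_\ell$ for $\ell \notin \{i_1, i_2, \ldots, i_m\}$ we obtain

\[ \theta_{i_j} = \sum_{a=1}^{j} h_{ja} \sum_{\ell = i_{a-1}+1}^{i_a - 1} w_\ell\theta_\ell \tag{2.2} \]

where

\[ h_{ja} = W_{i_j - 1}^{-1} \prod_{\gamma = a}^{j-1} W_{i_\gamma} W_{i_\gamma - 1}^{-1} \]

and $i_0 = 0$.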
(We adopt the convention that a product over the empty set equals one while
a sum over the empty set equals zero.) Substituting (2.2) into the
log-likelihood function and equating the derivatives to zero we obtain the
equations

\[ (\theta_j - y_j) - \sum_{b=\ell(j)}^{m} (y_{i_b} - \theta_{i_b})\, h_{b,\ell(j)}\, w_{i_b} = 0;\ j \notin \{i_1, i_2, \ldots, i_m\}, \tag{2.3} \]

where $\ell(j) = \ell$ iff $i_{\ell-1} < j < i_\ell$. If we define $\bar y_i = W_i^{-1}\sum_{j=1}^{i} w_j y_j$,
then the equations (2.3) have a surprisingly tractable solution.


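Theorem 2.1. The solutions to the equations, (2.3), are given by

\[ \hat\theta_j = y_j + \sum_{b=\ell(j)}^{m} (y_{i_b} - \bar y_{i_b - 1})\, w_{i_b} W_{i_b}^{-1};\ j \notin \{i_1, i_2, \ldots, i_m\}, \tag{2.4} \]

which implies that

\[ \hat\theta_{i_b} = \bar y_{i_b} + \sum_{a=b+1}^{m} (y_{i_a} - \bar y_{i_a - 1})\, w_{i_a} W_{i_a}^{-1};\ b = 1,2,\ldots,m. \tag{2.5} \]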

Proof: We begin by arguing that if the vector $\hat\theta$ satisfies (2.4) and the
restrictions, (2.1), then its values for $j \in \{i_1, i_2, \ldots, i_m\}$ must be given
by (2.5). First, note that

\[ \begin{aligned}
\hat\theta_{i_1} &= W_{i_1 - 1}^{-1} \sum_{j=1}^{i_1 - 1} w_j\hat\theta_j \\
&= \bar y_{i_1 - 1} + \sum_{b=1}^{m} (y_{i_b} - \bar y_{i_b - 1})\, w_{i_b} W_{i_b}^{-1} \\
&= \bar y_{i_1 - 1}\big[1 - w_{i_1} W_{i_1}^{-1}\big] + y_{i_1} w_{i_1} W_{i_1}^{-1} + \sum_{b=2}^{m} (y_{i_b} - \bar y_{i_b - 1})\, w_{i_b} W_{i_b}^{-1} \\
&= \bar y_{i_1} + \sum_{b=2}^{m} (y_{i_b} - \bar y_{i_b - 1})\, w_{i_b} W_{i_b}^{-1}.
\end{aligned} \]

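We proceed by induction. Assuming (2.5) holds for $b$ and noting that
$\sum_{j=1}^{i_b} w_j\hat\theta_j = W_{i_b}\hat\theta_{i_b}$, we consider

\[ \begin{aligned}
\hat\theta_{i_{b+1}} &= W_{i_{b+1}-1}^{-1}\Big[ W_{i_b}\hat\theta_{i_b} + \sum_{j=i_b+1}^{i_{b+1}-1} w_j\hat\theta_j \Big] \\
&= W_{i_{b+1}-1}^{-1}\Big[ W_{i_b}\bar y_{i_b} + W_{i_b}\sum_{a=b+1}^{m} (y_{i_a} - \bar y_{i_a - 1})\, w_{i_a} W_{i_a}^{-1} + W_{i_{b+1}-1}\bar y_{i_{b+1}-1} - W_{i_b}\bar y_{i_b} \\
&\qquad\qquad + \big(W_{i_{b+1}-1} - W_{i_b}\big)\sum_{a=b+1}^{m} (y_{i_a} - \bar y_{i_a - 1})\, w_{i_a} W_{i_a}^{-1} \Big] \\
&= W_{i_{b+1}-1}^{-1}\Big[ W_{i_{b+1}-1}\bar y_{i_{b+1}-1} + W_{i_{b+1}-1}\sum_{a=b+1}^{m} (y_{i_a} - \bar y_{i_a - 1})\, w_{i_a} W_{i_a}^{-1} \Big] \\
&= \bar y_{i_{b+1}} + \sum_{a=b+2}^{m} (y_{i_a} - \bar y_{i_a - 1})\, w_{i_a} W_{i_a}^{-1}.
\end{aligned} \]

Thus, by induction, (2.5) holds for $b = 1,2,\ldots,m$.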

Comparing (2.4) and (2.3) it suffices to show that

\[ \sum_{b=\ell}^{m} (y_{i_b} - \bar y_{i_b - 1})\, w_{i_b} W_{i_b}^{-1} = \sum_{b=\ell}^{m} (y_{i_b} - \hat\theta_{i_b})\, h_{b,\ell}\, w_{i_b} \tag{2.6} \]

for $\ell = 1,2,\ldots,m$. The equation, (2.6), holds for $\ell = m$ since
$\hat\theta_{i_m} = \bar y_{i_m}$, $h_{mm} = W_{i_m - 1}^{-1}$, and

\[ (y_{i_b} - \bar y_{i_b - 1})\, w_{i_b} W_{i_b}^{-1} = (y_{i_b} - \bar y_{i_b})\, w_{i_b} W_{i_b - 1}^{-1};\ b = 1,2,\ldots,m. \tag{2.7} \]

In order to establish (2.6) in general, we use induction. Assume that (2.6)
holds for $\ell = c+1$. Then

\[ \begin{aligned}
\sum_{b=c}^{m} (y_{i_b} - \bar y_{i_b - 1})\, w_{i_b} W_{i_b}^{-1}
&= (y_{i_c} - \bar y_{i_c - 1})\, w_{i_c} W_{i_c}^{-1} + \sum_{b=c+1}^{m} (y_{i_b} - \bar y_{i_b - 1})\, w_{i_b} W_{i_b}^{-1} \\
&= W_{i_c - 1}^{-1}\Big[ y_{i_c} - \hat\theta_{i_c} + \sum_{b=c+1}^{m} (y_{i_b} - \bar y_{i_b - 1})\, w_{i_b} W_{i_b}^{-1} \Big] w_{i_c} + \sum_{b=c+1}^{m} (y_{i_b} - \bar y_{i_b - 1})\, w_{i_b} W_{i_b}^{-1} \\
&= W_{i_c - 1}^{-1}\big[ y_{i_c} - \hat\theta_{i_c} \big] w_{i_c} + \sum_{b=c+1}^{m} (y_{i_b} - \hat\theta_{i_b})\, h_{b,c+1}\, w_{i_b}\, \big[ 1 + w_{i_c} W_{i_c - 1}^{-1} \big] \\
&= \big[ y_{i_c} - \hat\theta_{i_c} \big] h_{cc}\, w_{i_c} + W_{i_c} W_{i_c - 1}^{-1} \sum_{b=c+1}^{m} (y_{i_b} - \hat\theta_{i_b})\, h_{b,c+1}\, w_{i_b} \\
&= \sum_{b=c}^{m} (y_{i_b} - \hat\theta_{i_b})\, h_{b,c}\, w_{i_b}
\end{aligned} \]

using (2.7), (2.5), the induction hypothesis, $W_{i_c - 1}^{-1} = h_{cc}$ and the fact that
$h_{b,c+1} W_{i_c} W_{i_c - 1}^{-1} = h_{b,c}$. Since (2.6) holds for all $\ell$, the theorem is
established.

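We now show a remarkable property of the solutions given by (2.4) and
(2.5). Since $\hat\theta_{i_c} = W_{i_c - 1}^{-1}\sum_{j=1}^{i_c - 1} w_j\hat\theta_j$,

\[ \sum_{j=1}^{i_c} \hat\theta_j w_j = W_{i_c - 1}\hat\theta_{i_c} + w_{i_c}\hat\theta_{i_c} = W_{i_c}\hat\theta_{i_c}
 = W_{i_c}\Big[ \bar y_{i_c} + \sum_{b=c+1}^{m} (y_{i_b} - \bar y_{i_b - 1})\, w_{i_b} W_{i_b}^{-1} \Big] \tag{2.8} \]

by (2.5). Suppose $i \notin \{i_1, i_2, \ldots, i_m\}$ and also that $i_c$ is the largest
element of $\{i_1, i_2, \ldots, i_m\}$ such that $i_c < i$. Then, using (2.8) and (2.4),

\[ \begin{aligned}
\bar{\hat\theta}_{i-1} &= W_{i-1}^{-1} \sum_{j=1}^{i-1} \hat\theta_j w_j
 = W_{i-1}^{-1}\Big[ \sum_{j=1}^{i_c} \hat\theta_j w_j + \sum_{j=i_c+1}^{i-1} \hat\theta_j w_j \Big] \\
&= W_{i-1}^{-1}\Big[ W_{i_c}\bar y_{i_c} + \sum_{j=i_c+1}^{i-1} y_j w_j + \sum_{b=c+1}^{m} (y_{i_b} - \bar y_{i_b - 1})\, w_{i_b} W_{i_b}^{-1} \cdot \big[ W_{i_c} + W_{i-1} - W_{i_c} \big] \Big] \\
&= \bar y_{i-1} + \sum_{b=c+1}^{m} (y_{i_b} - \bar y_{i_b - 1})\, w_{i_b} W_{i_b}^{-1}.
\end{aligned} \tag{2.9} \]

Thus, comparing (2.4) and (2.9), we see that

\[ \bar{\hat\theta}_{i-1} > (<)\ \hat\theta_i \quad\text{iff}\quad \bar y_{i-1} > (<)\ y_i, \]

regardless of $i_1, i_2, \ldots, i_m$. It follows that if we wish to find the MLE of
the vector $\theta$ subject to the constraints

\[ \bar\theta_{i-1} \ge (\le)\ \bar\theta_i \quad (\text{equivalently } \bar\theta_{i-1} \ge (\le)\ \theta_i);\ i = 1,2,\ldots,k \]

then we know that the $i$th constraint $\bar\theta_{i-1} = \theta_i$ needs to be imposed if and
only if $\bar y_{i-1} < (>)\ y_i$ (see Barlow et al. (1972), page 89).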

Compared to other restricted optimization problems, this is a noteworthy
property and it allows us to write the solutions in a concise form.

Adopting the standard notation, $a^+ = \max\{a,0\}$ and $a^- = \min\{a,0\}$ we
have established the following theorem which generalizes the work of
Shaked (1979).


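Theorem 2.2. The MLE of the vector $\theta = (\theta_1, \theta_2, \ldots, \theta_k)$ subject to the
constraints $\bar\theta_{i-1}\ (\ge)\ (=)\ (\le)\ \theta_i$; $i = 1,2,\ldots,k$ is given by

\[ \hat\theta_i = \varphi_i + \sum_{j=i+1}^{k} (y_j - \bar y_{j-1})^{\psi_j}\, w_j W_j^{-1} \tag{2.10} \]

where

\[ (\varphi_i, \psi_i) = \begin{cases}
(\min(y_i, \bar y_i),\ +) & \text{if the } i\text{th constraint is } \ge \\
(\bar y_i,\ 1) & \text{if the } i\text{th constraint is } = \\
(\max(y_i, \bar y_i),\ -) & \text{if the } i\text{th constraint is } \le .
\end{cases} \]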


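For example, if the restrictions require that $\theta$ is DAL, then

\[ \hat\theta_i = \min(y_i, \bar y_i) + \sum_{j=i+1}^{k} (y_j - \bar y_{j-1})^{+}\, w_j W_j^{-1}. \]

If $h$ is some integer ($1 < h < k$) and our restrictions require that
$\bar\theta_1 \ge \cdots \ge \bar\theta_h \le \bar\theta_{h+1} \le \cdots \le \bar\theta_k$ then

\[ \hat\theta_i = \min(y_i, \bar y_i) + \sum_{j=i+1}^{h} (y_j - \bar y_{j-1})^{+}\, w_j W_j^{-1} + \sum_{j=h+1}^{k} (y_j - \bar y_{j-1})^{-}\, w_j W_j^{-1} \quad\text{for } i \le h \]

while

\[ \hat\theta_i = \max(y_i, \bar y_i) + \sum_{j=i+1}^{k} (y_j - \bar y_{j-1})^{-}\, w_j W_j^{-1} \quad\text{for } h < i \le k. \]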
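
The closed form above is simple to compute in a single backward pass. The following Python sketch is illustrative only: the function name `restricted_mle` and the string encoding of the constraint signs are hypothetical, and the pairing of min with the positive part (for a ">=" constraint) and max with the negative part (for a "<=" constraint) is the one implied by the rule that a constraint is imposed exactly when the unrestricted values violate it.

```python
from itertools import accumulate


def restricted_mle(y, w, signs):
    """Sketch of the closed-form restricted MLE.

    signs[i-1] (for 0-based i = 1..k-1) is one of ">=", "=", "<=" and relates the
    running average of theta through position i-1 to theta at position i.
    """
    k = len(y)
    W = list(accumulate(w))                      # W_i = w_1 + ... + w_i
    sums = accumulate(wi * yi for wi, yi in zip(w, y))
    ybar = [s / Wi for s, Wi in zip(sums, W)]    # weighted running means of y
    theta = [0.0] * k
    tail = 0.0                                   # sum of later correction terms
    for i in range(k - 1, 0, -1):
        d = (y[i] - ybar[i - 1]) * w[i] / W[i]
        s = signs[i - 1]
        if s == ">=":
            phi, term = min(y[i], ybar[i]), max(d, 0.0)
        elif s == "<=":
            phi, term = max(y[i], ybar[i]), min(d, 0.0)
        elif s == "=":
            phi, term = ybar[i], d
        else:
            raise ValueError("unknown constraint sign: %r" % (s,))
        theta[i] = phi + tail
        tail += term
    theta[0] = y[0] + tail                       # no constraint attaches to position 1
    return theta
```

With unit weights and all constraints ">=" (DAL), the observations (1, 3, 2, 5, 0) are mapped to (2.75, 2.75, 2.75, 2.75, 0), whose running averages are nonincreasing and whose weighted mean equals that of the data.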

1  11  +  J  —  -L  ,]  J 


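The MLE of $\theta$ subject to the requirement that $\theta$ is lower starshaped
($\theta$ is DAL and $\bar\theta_k \ge 0$), say $\tilde\theta$, is easily expressed in terms of the $\hat\theta$
given in (2.10). In particular,

\[ \tilde\theta_i = \begin{cases} \hat\theta_i & \text{if } \bar y_k \ge 0 \\ \hat\theta_i - \bar y_k & \text{if } \bar y_k < 0. \end{cases} \]

The proof of this is a straightforward verification of the properties
characterizing projections in Theorem 7.8 of Barlow et al. (1972).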


3. RESTRICTED HYPOTHESIS TESTS. Consider the problem of testing the null
hypothesis $H_0$: $\theta_1 = \theta_2 = \cdots = \theta_k$ when the parameter vector $\theta$ is known
to be DAL. In other words, test $H_0$ against the alternative $H_1 - H_0$ ($H_1$
but not $H_0$) where $H_1$: $\bar\theta_1 \ge \bar\theta_2 \ge \cdots \ge \bar\theta_k$. We consider a likelihood ratio
statistic which has a surprisingly tractable distribution.

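It is well known that the MLE's, under $H_0$, are given by

\[ \hat\theta_i^{0} = \bar Y = \frac{\sum_{i=1}^{k} w_i Y_i}{\sum_{i=1}^{k} w_i} \quad (\text{free of } \sigma^2);\ i = 1,2,\ldots,k, \]

and

\[ \hat\sigma_0^2 = k^{-1} \sum_{i=1}^{k} \frac{(Y_i - \bar Y)^2}{a_i}. \]

Since the MLE of the vector $\theta$ which satisfies $H_1$ does not depend on $\sigma^2$,
it follows that the MLE's under $H_1$ are $\hat\theta$, as derived in Section 2, and

\[ \hat\sigma_1^2 = k^{-1} \sum_{i=1}^{k} \frac{(Y_i - \hat\theta_i)^2}{a_i}. \]

The likelihood ratio, $\Lambda$, is then given by

\[ \Lambda = \big( \hat\sigma_1^2 / \hat\sigma_0^2 \big)^{k/2}. \]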


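Now

\[ \begin{aligned}
k\hat\sigma_0^2/\sigma^2 &= \sum_{i=1}^{k} (Y_i - \bar Y)^2 w_i \\
&= \sum_{i=1}^{k} (Y_i - \hat\theta_i)^2 w_i + \sum_{i=1}^{k} (\hat\theta_i - \bar Y)^2 w_i + 2\sum_{i=1}^{k} (Y_i - \hat\theta_i)(\hat\theta_i - \bar Y) w_i.
\end{aligned} \tag{3.1} \]

The class of vectors which satisfy $H_1$ form a closed convex cone
which contains all the constant functions. The vector $\hat\theta$ is the weighted
least squares projection of the vector $Y$ onto this cone so that

\[ \sum_{i=1}^{k} (Y_i - \hat\theta_i)\hat\theta_i w_i = \sum_{i=1}^{k} (Y_i - \hat\theta_i)\bar Y w_i = 0 \]

(see Barlow et al. (1972), page 318). Thus, the last term in (3.1) is zero
and a likelihood ratio test rejects $H_0$ in favor of $H_1 - H_0$ for large values of

\[ Q = 1 - \Lambda^{2/k} = \frac{\sum_{i=1}^{k} (\hat\theta_i - \bar Y)^2 w_i}{\sum_{i=1}^{k} (Y_i - \bar Y)^2 w_i}. \tag{3.2} \]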


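Suppose that $i_1, i_2, \ldots, i_m$ are those indices where $\bar{\hat\theta}_{i-1} = \hat\theta_i$ when
$\hat\theta$ is obtained (i.e., those indices for which $\bar Y_{i-1} < Y_i$). Using (2.4) and
(2.5) we can write

\[ \begin{aligned}
\sum_{i=1}^{k} (Y_i - \hat\theta_i)^2 w_i
&= \sum_{a=1}^{m} \Big[ Y_{i_a} - \bar Y_{i_a} - \sum_{b=a+1}^{m} (Y_{i_b} - \bar Y_{i_b - 1})\, w_{i_b} W_{i_b}^{-1} \Big]^2 w_{i_a}
 + \sum_{a=1}^{m} \Big[ \sum_{b=a}^{m} (Y_{i_b} - \bar Y_{i_b - 1})\, w_{i_b} W_{i_b}^{-1} \Big]^2 \big( W_{i_a - 1} - W_{i_{a-1}} \big) \\
&= \sum_{a=1}^{m} \big[ Y_{i_a} - \bar Y_{i_a} - S_{a+1} \big]^2 w_{i_a} + \sum_{a=1}^{m} S_a^2 \big( W_{i_a - 1} - W_{i_{a-1}} \big)
\end{aligned} \]

where $S_a = \sum_{b=a}^{m} (Y_{i_b} - \bar Y_{i_b - 1})\, w_{i_b} W_{i_b}^{-1}$. Adding the first term from each of the
above sums and using (2.7), we obtain

\[ \begin{aligned}
&\big[ (Y_{i_1} - \bar Y_{i_1 - 1})\, W_{i_1 - 1} W_{i_1}^{-1} - S_2 \big]^2 w_{i_1} + \big[ (Y_{i_1} - \bar Y_{i_1 - 1})\, w_{i_1} W_{i_1}^{-1} + S_2 \big]^2 W_{i_1 - 1} \\
&\qquad = (Y_{i_1} - \bar Y_{i_1 - 1})^2\, w_{i_1} W_{i_1 - 1} W_{i_1}^{-1} + W_{i_1} S_2^2.
\end{aligned} \]

The term, $W_{i_1} S_2^2$, cancels part of the second term of the second sum.
Proceeding, using similar reasoning we obtain

\[ \sum_{i=1}^{k} (Y_i - \hat\theta_i)^2 w_i = \sum_{a=1}^{m} (Y_{i_a} - \bar Y_{i_a - 1})^2\, w_{i_a} W_{i_a - 1} W_{i_a}^{-1}. \tag{3.3} \]

A fairly simple induction yields

\[ \sum_{i=1}^{k} (Y_i - \bar Y)^2 w_i = \sum_{i=2}^{k} (Y_i - \bar Y_{i-1})^2\, w_i W_{i-1} W_i^{-1}. \tag{3.4} \]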

Using (3.3) and (3.4) in our expression for Q, we obtain

\[ Q = \frac{\sum_{i \notin I} Z_i^2}{\sum_{i=2}^{k} Z_i^2} \]

where $I = \{i_1, i_2, \ldots, i_m\}$, the sum in the numerator is over $i \in \{2,3,\ldots,k\}$ with
$i \notin I$, and $Z_i = (Y_i - \bar Y_{i-1})\, w_i^{1/2} W_{i-1}^{1/2} W_i^{-1/2}$; $i = 2,3,\ldots,k$.
Under $H_0$, $Z_2, Z_3, \ldots, Z_k$ are independent, standard normal random variables
so that for a given set of indices, $I = \{i_1, i_2, \ldots, i_m\}$, Q has a beta
distribution with parameters $(k-m-1)/2$ and $m/2$.
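
The decomposition (3.4) of the weighted sum of squares about the overall mean into successive squared contrasts can be checked numerically. The sketch below is only a sanity check under that reading of (3.4); the helper names `total_ss` and `successive_terms` are hypothetical.

```python
def total_ss(y, w):
    """Weighted sum of squares about the weighted mean: sum_i (y_i - ybar)^2 w_i."""
    W = sum(w)
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / W
    return sum(wi * (yi - ybar) ** 2 for wi, yi in zip(w, y))


def successive_terms(y, w):
    """Terms Z_i^2 = (y_i - ybar_{i-1})^2 w_i W_{i-1} / W_i for i = 2, ..., k."""
    terms = []
    W, S = w[0], w[0] * y[0]          # running weight total and weighted sum
    for yi, wi in zip(y[1:], w[1:]):
        terms.append((yi - S / W) ** 2 * wi * W / (W + wi))
        W += wi
        S += wi * yi
    return terms
```

For any data and positive weights, `sum(successive_terms(y, w))` should agree with `total_ss(y, w)` up to round-off, which is the content of (3.4).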

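Now, suppose $I \subset \{2,3,\ldots,k\} = I_0$ and that $E_I$ is the event
$\{Z_i \ge 0;\ i \in I \text{ and } Z_i < 0;\ i \notin I\}$. If we let $T_J = \sum_{i \in J} Z_i^2$, then we may write

\[ \begin{aligned}
P[Q \ge t, E_I] &= P\big[ T_{I_0 - I}/T_{I_0} \ge t \mid E_I \big]\, P(E_I) \\
&= P\big[ B_{(k-m-1)/2,\, m/2} \ge t \big] \cdot (1/2)^{k-1}
\end{aligned} \]

since $T_{I_0 - I}/T_{I_0}$ is independent of $E_I$.

If we partition the event $\{Q \ge t\}$ by intersecting it with all such events
$E_I$ and then collect terms, we obtain the following theorem.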


Theorem 3.1. In testing $H_0$: $\theta_1 = \theta_2 = \cdots = \theta_k$ against the alternative
$H_1 - H_0$ where $H_1$ specifies that the parameter vector $\theta$ is DAL the
likelihood ratio statistic $Q = 1 - \Lambda^{2/k} = \big(\sum_{i=1}^{k} (\hat\theta_i - \bar Y)^2 w_i\big) / \big(\sum_{i=1}^{k} (Y_i - \bar Y)^2 w_i\big)$ has
a null hypothesis distribution given by

\[ P[Q \ge t] = \sum_{m=0}^{k-1} \binom{k-1}{m} (1/2)^{k-1}\, P\big[ B_{(k-m-1)/2,\, m/2} \ge t \big] \tag{3.5} \]

for all $t$, where $B_{\alpha,\beta}$ denotes a standard beta random variable with
parameters $\alpha$ and $\beta$ and $B_{0,\beta}$ ($B_{\alpha,0}$) is taken to be degenerate at
0 (1) when $\beta > 0$ ($\alpha > 0$).


We note that the distribution of Q given in (3.5) is exactly the
same as the distribution of

\[ \frac{\sum_{i=1}^{k-1} (Z_i \wedge 0)^2}{\sum_{i=1}^{k-1} Z_i^2} \tag{3.6} \]

where $Z_1, Z_2, \ldots, Z_{k-1}$ are independent standard normal random variables.

If, in fact, $H_0$ is not true then we can replace $Y_i$ by $Y_i - \theta_i$ in our
expression for Q and say that Q is distributed as

\[ \frac{\sum_{i=1}^{k-1} [(Z_i + \delta_i) \wedge 0]^2}{\sum_{i=1}^{k-1} (Z_i + \delta_i)^2}
 = \frac{\sum_{i=1}^{k-1} [(Z_i + \delta_i) \wedge 0]^2}{\sum_{i=1}^{k-1} [(Z_i + \delta_i) \wedge 0]^2 + \sum_{i=1}^{k-1} [(Z_i + \delta_i) \vee 0]^2} \tag{3.7} \]

where $Z_1, Z_2, \ldots, Z_{k-1}$ are independent, standard normal random variables and

\[ \delta_i = (\theta_{i+1} - \bar\theta_i)\, w_{i+1}^{1/2} W_i^{1/2} W_{i+1}^{-1/2};\ i = 1,2,\ldots,k-1. \tag{3.8} \]

Note that if the parameter vector $\theta$ satisfies $H_1$ then $\delta_i \le 0$; $i = 1,2,\ldots,k-1$.
Moreover if $\delta_i \le 0$ then $[(Z_i + \delta_i) \wedge 0]^2 \ge [Z_i \wedge 0]^2$ and
$[(Z_i + \delta_i) \vee 0]^2 \le [Z_i \vee 0]^2$. This implies that (3.7) $\ge$ (3.6) and shows that
our likelihood ratio test is unbiased.

Of course the same distribution theory holds for testing $H_0$: $\theta_1 = \theta_2 = \cdots = \theta_k$
against the alternative $H_1 - H_0$ if $H_1$ specifies that $\theta$ is
IAL, IAR or DAR. It is perhaps somewhat surprising to note that we may
intermix the inequality signs. For example, the same distribution theory
would hold for testing $H_0$ against $H_1 - H_0$ if $H_1$ specified that
$\bar\theta_1 \ge \bar\theta_2 \ge \cdots \ge \bar\theta_h \le \bar\theta_{h+1} \le \cdots \le \bar\theta_k$ for some value of $h$.

If $\sigma^2$ is known, a likelihood ratio test of $H_0$ against $H_1 - H_0$ would
reject for large values of

\[ R = -2 \ln \Lambda = \sum_{i=1}^{k} (\hat\theta_i - \bar Y)^2 w_i. \]

The distribution of R is the same as that of

\[ \sum_{i=1}^{k-1} [(Z_i + \delta_i) \wedge 0]^2 \tag{3.9} \]

where $Z_i$ and $\delta_i$; $i = 1,2,\ldots,k-1$ are defined as before. If the null
hypothesis is satisfied then $\delta_i = 0$; $i = 1,2,\ldots,k-1$ and if $H_1$ is
satisfied then $\delta_i \le 0$; $i = 1,2,\ldots,k-1$. Thus, the test is unbiased and the
null hypothesis distribution of R is given by

\[ P[R \ge t] = \sum_{m=0}^{k-1} \binom{k-1}{m} (1/2)^{k-1}\, P[\chi_m^2 \ge t] \tag{3.10} \]

where $\chi_m^2$ denotes a standard chi-square random variable with $m$ degrees
of freedom ($\chi_0^2 \equiv 0$).
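
The mixture (3.10) is easy to evaluate for integer degrees of freedom. The Python sketch below is one possible way to do so using only the standard library; the function names are hypothetical, and the chi-square tail is built from the standard recurrence for the regularized upper incomplete gamma function, $Q(a+1,x) = Q(a,x) + x^a e^{-x}/\Gamma(a+1)$, starting from the closed forms for one and two degrees of freedom.

```python
import math


def chi2_sf(t, m):
    """P[chi^2_m >= t] for integer m >= 0, with chi^2_0 degenerate at 0."""
    if t <= 0:
        return 1.0
    if m == 0:
        return 0.0
    x = t / 2.0
    if m % 2 == 1:
        sf, a = math.erfc(math.sqrt(x)), 0.5   # one degree of freedom
    else:
        sf, a = math.exp(-x), 1.0              # two degrees of freedom
    while 2 * a < m:                           # step the degrees of freedom up by two
        sf += x ** a * math.exp(-x) / math.gamma(a + 1.0)
        a += 1.0
    return sf


def chibar_tail(t, k):
    """P[R >= t] from (3.10): binomial(k-1, 1/2) mixture of chi-square tails."""
    return sum(math.comb(k - 1, m) * 0.5 ** (k - 1) * chi2_sf(t, m)
               for m in range(k))
```

For example, with k = 2 the tail reduces to half of a chi-square tail with one degree of freedom, so `chibar_tail(t, 2)` equals `0.5 * chi2_sf(t, 1)`. A critical value for a given level can then be found by any one-dimensional root search on `chibar_tail`.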

Consider the problem of testing $H_1$ as a null hypothesis. If $\sigma^2$
is known then a likelihood ratio test rejects $H_1$ for large values of

\[ R' = \sum_{i=1}^{k} (Y_i - \hat\theta_i)^2 w_i. \]

The random variable $R'$ is distributed as

\[ \sum_{i=1}^{k-1} [(Z_i + \delta_i) \vee 0]^2 \tag{3.11} \]

and $\delta_i \le 0$ if $H_1$ is true. Thus $[(Z_i + \delta_i) \vee 0]^2 \le [Z_i \vee 0]^2$ and

\[ P_\theta[R' \ge t] \le P_{H_0}[R' \ge t] = \sum_{m=0}^{k-1} \binom{k-1}{m} (1/2)^{k-1}\, P(\chi_m^2 \ge t). \tag{3.12} \]

If $\sigma^2$ is unknown, we cannot estimate $\sigma^2$ in the denominator of the
likelihood ratio since we have only one item from each population. Thus, in
this case we cannot construct a likelihood ratio test.

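Assume we have a random sample of size $n$ from each of our $k$ normal
populations and let $\bar X_i$ denote the mean of the items of the sample
corresponding to the population with mean $\theta_i$; $i = 1,2,\ldots,k$. The maximum
likelihood estimators subject to the constraints, (2.1), are given by (2.4) and
(2.5) with $y_i$ replaced by $\bar X_i$.

If $\sigma^2$ is unknown, a likelihood ratio test rejects $H_0$ for large values of

\[ Q = 1 - \Lambda^{2/N} = \frac{\sum_{i=1}^{k} (\hat\theta_i - \bar X)^2\, n w_i}{\sum_{i=1}^{k}\sum_{j=1}^{n} (X_{ij} - \bar X_i)^2 w_i + \sum_{i=1}^{k} (\bar X_i - \hat\theta_i)^2\, n w_i + \sum_{i=1}^{k} (\hat\theta_i - \bar X)^2\, n w_i} \]

where $\bar X = W_k^{-1} \sum_{i=1}^{k} w_i \bar X_i$. Since the $X_{ij} - \bar X_i$ are independent of the $\bar X_i$
and since $\sum_{i=1}^{k}\sum_{j=1}^{n} (X_{ij} - \bar X_i)^2 w_i$ has a chi-square distribution with $k(n-1)$
degrees of freedom it follows that Q has the same distribution as

\[ \frac{\sum_{i=1}^{k-1} [(Z_i + \delta_i) \wedge 0]^2}{\sum_{i=1}^{k-1} (Z_i + \delta_i)^2 + \sum_{i=k}^{N-1} Z_i^2} \]

where $N = k \cdot n$, $Z_1, Z_2, \ldots, Z_{N-1}$ are independent standard normal variables,
and $\delta_i/\sqrt{n}$ is defined by (3.8). The following theorem is a consequence of
this representation.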


Theorem 3.2. The likelihood ratio test of $H_0$ against $H_1 - H_0$ based upon
Q is unbiased and the null hypothesis distribution of Q is given by

\[ P[Q \ge t] = \sum_{m=0}^{k-1} \binom{k-1}{m} (1/2)^{k-1}\, P\big[ B_{(k-m-1)/2,\, (N-k+m)/2} \ge t \big]. \]

If $\sigma^2$ is known then the likelihood ratio statistic $R = -2 \ln \Lambda$
gives rise to an unbiased test and its null hypothesis distribution is given
by (3.10).

In testing $H_1$ as a null hypothesis when $\sigma^2$ is known the likelihood
ratio statistic $R' = -2 \ln \Lambda = \sum_{i=1}^{k} (\bar X_i - \hat\theta_i)^2\, n w_i$ has a distribution which
may be represented as in (3.11). Thus, the null hypothesis distribution
is as in (3.12).

If the common sample size, $n$, is larger than one, we can test $H_1$
as a null hypothesis when $\sigma^2$ is unknown. In testing $H_1$ against $\sim H_1$,
reject for large values of

\[ Q' = 1 - \Lambda^{2/N} = \frac{\sum_{i=1}^{k} (\bar X_i - \hat\theta_i)^2\, n w_i}{\sum_{i=1}^{k}\sum_{j=1}^{n} (X_{ij} - \bar X_i)^2 w_i + \sum_{i=1}^{k} (\bar X_i - \hat\theta_i)^2\, n w_i + \sum_{i=1}^{k} (\hat\theta_i - \bar X)^2\, n w_i}. \tag{3.13} \]


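It can be argued that $Q'$ is distributed as

\[ \frac{\sum_{i=1}^{k-1} [(Z_i + \delta_i) \vee 0]^2}{\sum_{i=1}^{k-1} (Z_i + \delta_i)^2 + \sum_{i=k}^{N-1} Z_i^2} \]

where $\delta_i/\sqrt{n}$ is defined by (3.8).

The following theorem is a result of this representation.

Theorem 3.3. A likelihood ratio test of $H_1$ against $\sim H_1$ rejects
for large values of $Q'$ (as given in (3.13)), is unbiased and

\[ \sup_{\theta \in H_1} P_\theta[Q' \ge t] = P_{H_0}[Q' \ge t] = \sum_{m=0}^{k-1} \binom{k-1}{m} (1/2)^{k-1}\, P\big[ B_{m/2,\, (N-m-1)/2} \ge t \big]. \]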


4. CONCLUDING REMARKS. If we wish to replace "$\ge$" or "$\le$" by "=" in
the restriction imposed by our alternative hypothesis, $H_1$, the appropriate
distribution theory may be found from the results in Section 3 by appropriate
adjustments in the degrees of freedom.

We conducted a Monte Carlo study of the power of the test statistic,
$R = \sum_{i=1}^{k} (\hat\theta_i - \bar Y)^2 w_i$, for testing $H_0$ against $H_1 - H_0$ ($H_1$: $\theta$ is DAL) when
$\sigma^2$ is known. Some of the results of that study are given in Table 1. In
this study we let $k = 5$, $a_i = 1$; $i = 1,2,\ldots,5$ and $\sigma^2 = 1$. We approximated
the power of each of three test statistics at each of 28 parameter vectors,
$\theta$. This was done by randomly generating 2000 5-tuples of normal random
numbers where the $i$th entry in the 5-tuple has a normal distribution with
mean $\theta_i$ and variance 1. The entries in the table are the fraction of
times the test statistic exceeded the .05 critical value computed from
its null hypothesis distribution ($\chi_4^2$ for $\chi^2$, using (3.10) for R and
using Theorem 3.1 in Barlow et al. for $\bar\chi^2$). Corresponding to each
parameter vector we have given its spacing and a measure, $\Delta$,
($\Delta = [\sum_{i=1}^{5} (\theta_i - \bar\theta)^2]^{1/2}$) of its distance from the null hypothesis $H_0$.
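
A small simulation in the spirit of this study can be sketched in Python by drawing R from the representation (3.9) instead of computing the restricted estimates directly. Everything below is illustrative: the function names, the seed, and the number of replications are hypothetical, and the critical value passed in (about 6.5 for k = 5 at the .05 level, obtained by solving (3.10) numerically) is an approximation assumed for this example, not a figure taken from the study.

```python
import math
import random


def delta(theta, w):
    """Shifts from (3.8): (theta_{i+1} - theta_bar_i) * sqrt(w_{i+1} W_i / W_{i+1})."""
    out, W, S = [], 0.0, 0.0
    for i in range(len(theta) - 1):
        W += w[i]                      # W_i
        S += w[i] * theta[i]           # weighted partial sum, so S / W = theta_bar_i
        W_next = W + w[i + 1]          # W_{i+1}
        out.append((theta[i + 1] - S / W) * math.sqrt(w[i + 1] * W / W_next))
    return out


def simulate_power(theta, w, crit, reps=2000, seed=0):
    """Fraction of simulated values of R, drawn via (3.9), that reach crit."""
    rng = random.Random(seed)
    d = delta(theta, w)
    hits = 0
    for _ in range(reps):
        r = sum(min(rng.gauss(0.0, 1.0) + di, 0.0) ** 2 for di in d)
        hits += r >= crit
    return hits / reps
```

With unit weights, the squared shifts for $\theta = (4, 3, 2, 1, 0)$ sum to 10, which is $\Delta^2$ for that vector, and `simulate_power([4, 3, 2, 1, 0], [1] * 5, 6.5)` gives an estimate that can be compared with the corresponding entry for R in Table 1.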

The first thirteen $\theta$ vectors in the table are decreasing and the
power of $\bar\chi^2$ is significantly greater than that of either R or $\chi^2$.
However, for these vectors R is significantly more powerful than is $\chi^2$.
The last fifteen $\theta$ vectors are DAL but not decreasing. Here R
is significantly more powerful than either $\chi^2$ or $\bar\chi^2$. Note that $\chi^2$ is
more powerful than $\bar\chi^2$ for the last six $\theta$ entries.

We have been unable to substantially relax the assumption of equal
sample sizes in Sections 2 and 3. In other words, this analysis depends
very heavily on the assumption that the weights in the restriction imposed
by $H_1$ are inversely proportional to the variances of the sample means. If
we relax the assumption of equal sample sizes we must be willing to use weights
$w_i = n_i (a_i\sigma^2)^{-1}$ in our restriction, $H_1$. This latter approach is the one
taken in Shaked (1979).

A different definition of decreasing on the average can be found in
the work of Robertson and Wright (1981). They define the parameter vector,
$\theta$, to be decreasing on the average (DA) if $i^{-1}\sum_{j=1}^{i} \theta_j \ge (k-i)^{-1}\sum_{j=i+1}^{k} \theta_j$;
$i = 1,2,\ldots,k-1$. If $\theta$ is DAL or IAR then $\theta$ is DA, in the above sense.
Robertson and Wright (1981) derive maximum likelihood estimates of a vector
of normal means subject to the restriction that it is DA and discuss testing
homogeneity of $\theta$ when it is assumed to be DA and testing DA as a null
hypothesis. They assume that $\sigma^2$ is known and consider likelihood ratio
statistics whose null hypothesis distributions are chi-bar-square
distributions.

Table 1. Power Functions of $\chi^2$, $\bar\chi^2$ and R.

$(\theta_1,\theta_2,\theta_3,\theta_4,\theta_5)$     Spacing      $\Delta$    Power of $\chi^2$   Power of $\bar\chi^2$   Power of R

(0.0, 0.0, 0.0, 0.0, 0.0)    1,1,1,1,1    0       .061    .047    .058
(0.4, 0.3, 0.2, 0.1, 0.0)    5,4,3,2,1    .316    .06?    .083    .084
(1.0, .75, 0.5, .25, 0.0)    5,4,3,2,1    .791    .089    .173    .147
(2.0, 1.5, 1.0, 0.5, 0.0)    5,4,3,2,1    1.58    .206    .459    .381
(4.0, 3.0, 2.0, 1.0, 0.0)    5,4,3,2,1    3.16    .743    .913    .874
(1.0, .9, .8, .7, 0.1)       10,9,8,7,1   .707    .082    .146    .118
(2.0, 1.8, 1.6, 1.4, 0.2)    10,9,8,7,1   1.41    .172    .359    .286
(5.0, 4.5, 4.0, 3.5, 0.5)    10,9,8,7,1   3.54    .838    .950    .926
(.1, .1, .1, .1, 0.0)        2,2,2,2,1    .089    .059    .052    .060
(.2, .2, .2, .2, 0.0)        2,2,2,2,1    .179    .061    .056    .064
(.5, .5, .5, .5, 0.0)        2,2,2,2,1    .447    .073    .081    .079
(1.0, 1.0, 1.0, 1.0, 0.0)    2,2,2,2,1    .894    .101    .166    .126
(5.0, 5.0, 5.0, 5.0, 0.0)    2,2,2,2,1    4.47    .967    .99?    .984
(.3, 0.0, .1, 0.0, .1)       4,1,2,1,2    .245    .061    .060    .071
(.6, 0.0, .2, 0.0, .2)       4,1,2,1,2    .490    .071    .083    .101
(1.5, 0.0, .5, 0.0, .5)      4,1,2,1,2    1.22    .144    .197    .225
(3.0, 0.0, 1.0, 0.0, 1.0)    4,1,2,1,2    2.45    .471    .583    .653
(6.0, 0.0, 2.0, 0.0, 2.0)    4,1,2,1,2    4.90    .988    .989    .991
(.4, .3, 0.0, .1, .2)        5,4,1,2,3    .316    .063    .066    .079
(.8, .6, 0.0, .2, .4)        5,4,1,2,3    .632    .075    .098    .110
(2.4, 1.8, 0.0, .6, 1.2)     5,4,1,2,3    1.90    .296    .397    .446
(4.8, 3.6, 0.0, 1.2, 2.4)    5,4,1,2,3    3.79    .880    .913    .946
(.2, 0.0, .1, .1, .1)        3,1,2,2,2    .141    .059    .052    .063
(.4, 0.0, .2, .2, .2)        3,1,2,2,2    .283    .064    .058    .070
(1.2, 0.0, .6, .6, .6)       3,1,2,2,2    .849    .099    .095    .128
(2.4, 0.0, 1.2, 1.2, 1.2)    3,1,2,2,2    1.70    .244    .221    .329
(4.8, 0.0, 2.4, 2.4, 2.4)    3,1,2,2,2    3.39    .795    .696    .867
(7.2, 0.0, 3.6, 3.6, 3.6)    3,1,2,2,2    5.09    .994    .969    .997

The maximum likelihood estimates of mortality rates discussed in
Section 1 subject to the restrictions that they are IAL and DAR are given in
Figures 2 and 3. Surely, actuaries would feel that either one of these
estimates requires additional smoothing. However, they both give an indication
of the "bump" at age 20 and either one (or their average) might provide a
better starting point than the crude mortality rate or its "isotonic
regression" (which is oversmoothed) for the graduation.
Figure 2. Mortalities Smoothed to be IAL.